NetEase Automatic Chinese Word Segmentation
Abstract
This document analyses the bakeoff results obtained by NetEase Co. in the SIGHAN5 Word Segmentation Task and Named Entity Recognition Task. The NetEase WS system is designed to facilitate research in natural language processing and information retrieval. It supports Chinese and English word segmentation, Chinese named entity recognition, Chinese part-of-speech tagging, and phrase conglutination. The evaluation results show that our WS system achieves passable precision in word segmentation, except in the recognition of unknown words.
Similar resources
Word Boundary Information and Chinese Word Segmentation
Chinese word segmentation could be considered as a problem of word boundary recognition. Word boundary information plays a significant role in human language acquisition and automatic segmentation for Natural Language Processing (NLP). Extraction of word boundary information involves cognitive psychology, computational linguistics, and language education. Methods utilizing word boundary informa...
A Comparative Study of the Effect of Word Segmentation On Chinese Terminology Extraction
Automatic term extraction is the first step towards automatic or semi-automatic update of existing domain knowledge base. Most of the researches applied word segmentation as a preprocessing step to Chinese term extraction. However, segmentation ambiguity is unavoidable, especially in identifying unknown words for Chinese. In this paper, we discuss the effect and limitations of segmentation to C...
A Web-based Approach To Chinese Word Segmentation
Chinese text processing requires the detection of word boundaries. This is a non-trivial step because Chinese does not contain explicit whitespace between words. Existing word segmentation techniques make use of precompiled dictionaries and treebanks. The creation of dictionaries and treebanks is a labor-intensive process and consequently they are updated infrequently. Furthermore, due to their...
Identification of Chinese Personal Names in Unrestricted Texts
Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation, and can affect other NLP tasks such as word segmentation and information retrieval, if it is not properly addressed. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and...
Semi-automatic Annotation of Chinese Word Structure
Chinese word structure annotation is potentially useful for many NLP tasks, especially for Chinese word segmentation. Li and Zhou (2012) have presented an annotation for word structures in the Penn Chinese Treebank. But they only consider words that have productive affixes, which covers 35% of word types in that corpus. In this paper, we propose a linguistically inspired annotation that covers ...
Publication date: 2006